Integration of talking heads and text-to-speech synthesizers for visual TTS

نویسندگان

Jörn Ostermann

Marc C. Beutnagel

Ariel Fischer

Yao Wang

چکیده

The integration of text-to-speech (TTS) synthesis and animation of synthetic faces allows new applications like visual human computer interfaces using agents or avatars. The TTS informs the talking head when phonemes are spoken. The appropriate mouth shapes are animated and rendered while the TTS produces the sound. We call this integrated system of TTS and animation a Visual TTS (VTTS). This paper describes the architecture on an integrated VTTS synthesizer that allows defining facial expressions as bookmarks in the text that will be animated while the model is talking. The position of a bookmark in the text defines the start time for the facial expression. The bookmark itself names the expression, its amplitude and the duration during which the amplitude has to be reached by the face. A bookmark to face animation parameter (FAP) converter creates a curve defining the amplitude for the given FAP over time using Hermite functions of 3 order.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Multimodal Speech Synthesis

Multimodal Speech Synthesis (’<Talking Heads”) encompasses synthesis of speech from text (“Text-toSpeech”, TTS) plus synthesis of a visual presentation of a face that is lip-synced to the generated audio (“Visual TTS”, VTTS). Talking Heads are now practical because of the ever-increasing computing power and falling prices of computer hardware. This paper highlights recent technological breakthr...

متن کامل

Generation of Personalized MPEG-4 compliant Talking Heads

This paper studies a new method for three-dimensional (3D) facial model adaptation and its integration into a Text-to-Speech (TTS) system. The TTS System pronounces, in real time, English or Greek speech and simultaneously animates the adapted face model, thus simulating a natural talking face. The 3D facial adaptation requires a set of two orthogonal views of the user’s face with a number of f...

متن کامل

A comparison of German talking heads in a smart home environment

The authors describe a newly developed German Text-Toaudiovisual-Speech (TTavS) synthesis system based on the English speaking HeadZero. Targets of the control parameters of the talking head are generated by mapping of German phonemes to the originally English visemic blend shapes controls. The resulting German version of HeadZero and the German talking head MASSY were extended to generate audi...

متن کامل

Real-time streaming for the animation of talking faces in multiuser environments

In order to enable face animation on the Internet using high quality synthetic speech, the Text-to-Speech (TTS) servers need to be implemented on network-based servers and shared by many users. The output of a TTS server is used to animate talking heads as defined in MPEG-4. The TTS server creates two sets of data: audio data and Phonemes with optional Facial Animation Parameters (FAP) like smi...

متن کامل

Photo-Realistic Talking-Heads from Image Samples

This paper describes a system for creating a photo-realistic model of the human head that can be animated and lip-synched from phonetic transcripts of text. Combined with a state-of-the-art text-to-speech synthesizer (TTS), it generates video animations of talking heads that closely resemble real people. To obtain a naturally looking head, we choose a “data-driven” approach. We record a talking...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره شماره

صفحات -

تاریخ انتشار 1998

Integration of talking heads and text-to-speech synthesizers for visual TTS

نویسندگان

چکیده

منابع مشابه

Multimodal Speech Synthesis

Generation of Personalized MPEG-4 compliant Talking Heads

A comparison of German talking heads in a smart home environment

Real-time streaming for the animation of talking faces in multiuser environments

Photo-Realistic Talking-Heads from Image Samples

عنوان ژورنال:

اشتراک گذاری